serial number
A Multiple-Fill-in-the-Blank Exam Approach for Enhancing Zero-Resource Hallucination Detection in Large Language Models
Munakata, Satoshi, Fukui, Taku, Mohri, Takao
Large language models (LLMs) often fabricate hallucinatory text. Several methods have been developed to detect such text by semantically comparing it with multiple probabilistically regenerated versions. However, a significant issue is that if the storyline of each regenerated text changes, the generated texts become incomparable, which worsens detection accuracy. In this paper, we propose a hallucination detection method that incorporates a multiple-fill-in-the-blank exam approach to address this storyline-changing issue. First, our method creates a multiple-fill-in-the-blank exam by masking multiple objects from the original text. Second, it prompts an LLM to repeatedly answer this exam. This approach ensures that the storylines of the exam answers align with the original ones. Finally, it quantifies the degree of hallucination for each original sentence by scoring the exam answers, considering the potential for \emph{hallucination snowballing} within the original text itself. Experimental results show that our method alone not only outperforms existing methods, but also achieves clearer state-of-the-art performance in ensembles with existing methods.
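The three steps in the abstract above (mask objects, collect repeated exam answers, score them) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `make_exam` and `disagreement_scores`, the `[BLANK i]` format, and the exact-match scoring are hypothetical, the answer sheets are hard-coded stand-ins for repeated LLM calls, and the paper's actual scoring (including its handling of hallucination snowballing) is not reproduced.

```python
from typing import List


def make_exam(text: str, objects: List[str]) -> str:
    """Replace the first occurrence of each object with a numbered blank."""
    exam = text
    for i, obj in enumerate(objects, start=1):
        exam = exam.replace(obj, f"[BLANK {i}]", 1)
    return exam


def disagreement_scores(objects: List[str],
                        answer_sheets: List[List[str]]) -> List[float]:
    """For each blank, return the fraction of answer sheets whose answer
    differs from the original object (case-insensitive exact match).

    A higher score means the regenerated answers disagree more often with
    the original text, which this sketch treats as a hallucination signal.
    """
    if not answer_sheets:
        raise ValueError("need at least one answer sheet")
    scores = []
    for i, obj in enumerate(objects):
        wrong = sum(
            1 for sheet in answer_sheets
            if sheet[i].strip().lower() != obj.strip().lower()
        )
        scores.append(wrong / len(answer_sheets))
    return scores


# Hypothetical example: the sheets below stand in for four LLM responses.
text = "Marie Curie won the Nobel Prize in 1903."
objects = ["Marie Curie", "1903"]
exam = make_exam(text, objects)
sheets = [
    ["Marie Curie", "1903"],
    ["Marie Curie", "1911"],
    ["marie curie", "1903"],
    ["Marie Curie", "1905"],
]
scores = disagreement_scores(objects, sheets)
```

Because every answer sheet fills the same blanks in the same sentence, the answers stay comparable position by position, which is the property the abstract attributes to the exam format.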
Trump assassination attempt: Suspect's possible 'personal vendetta' among investigators' 4 key questions
Now that alleged would-be Trump assassin Ryan Routh is in custody, the FBI and Florida police will have their hands full unraveling his planning process and what may have motivated him. Former NYPD investigator and security expert Patrick Brosnan told Fox News Digital that investigators will need to trawl through a litany of information in the coming weeks, including "all things cellular, online shopping; phone camera images, bank records, email correspondence, recent search engine inquiries, dating app activity, identification of any possible burner phones, footage from … city streets, UPS trucks, Amazon trucks or backup cameras, and all cell tower pings within a fixed distance." Using this information, investigators will build Routh's profile to answer these questions, according to Gene Petrino, a SWAT commander with nearly three decades in law enforcement and a master's degree in security management. Ryan W. Routh, suspected of attempting to assassinate Republican presidential nominee former President Trump at his West Palm Beach golf course, stands handcuffed after his arrest during a traffic stop near Palm City, Florida, Sept. 15, 2024. Petrino said investigators will obtain warrants to scour Routh's social media and speak with his family and associates to determine whether someone else was involved in planning his assassination attempt on Sunday afternoon or anyone who may have trained him beforehand.
DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?
Gu, Zhouhong, Zhang, Lin, Zhu, Xiaoxuan, Chen, Jiangjie, Huang, Wenhao, Zhang, Yikai, Wang, Shusen, Ye, Zheyu, Gao, Yan, Feng, Hongwei, Xiao, Yanghua
Detecting evidence within the context is a key step in reasoning tasks. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice questions, with an average of 994 tokens per question. Each question contains an average of 4.55 pieces of implicit evidence, and solving the problem typically requires 7.62 logical jumps to find the correct answer. To enhance the performance of LLMs in evidence detection, this paper proposes Detective Reasoning Prompt and Finetune. Experiments demonstrate that existing LLMs' abilities to detect evidence in long contexts are far inferior to those of humans. However, the Detective Reasoning Prompt effectively enhances the capability of powerful LLMs in evidence detection, while the Finetuning method shows significant effects in enhancing the performance of weaker LLMs. Moreover, when the abilities of LLMs in evidence detection are improved, their final reasoning performance is also enhanced accordingly.
SCOTUS to take up challenge to Biden admin's ghost gun rule that group deems 'abusive'
The Supreme Court announced Monday that it will hear a challenge to the Biden administration's regulation on so-called "ghost guns" next term. The rule in question was issued in 2022 by the Bureau of Alcohol, Tobacco, Firearms and Explosives (ATF) to regulate "buy build shoot" kits that are available online or in stores that allow any individual to assemble a working firearm without a background check or the usual serial numbers required by the federal government. The Fifth Circuit late last year struck down the rule, but the Justice Department appealed to the Supreme Court. The DOJ argued that the Gun Control Act of 1968 permits the rule because it defines a "firearm" to include "any weapon…which will or is designed to or may readily be converted to expel a projectile by the action of an explosive," as well as "the frame or receiver of any such weapon."
Surprise, this $30 video doorbell has serious security issues
Video doorbells manufactured by a Chinese company called Eken and sold under different brands for around $30 each come with serious security issues that put their users at risk, according to Consumer Reports. The publication found that these doorbell cameras are sold on popular marketplaces like Walmart, Sears and Amazon, which has even given some of their listings the Amazon Choice badge. They're listed under the brands Eken, Tuck, Fishbot, Rakeblue, Andoe, Gemee and Luckwolf, among others, and they're typically linked to a user's phone via the Aiwit app. Outside the US, the devices are sold on global marketplaces like Shein and Temu. We found them on Chinese website Alibaba and Southeast Asian e-commerce website Lazada, as well.
Natural Strategic Abilities in Voting Protocols
Jamroga, Wojciech, Kurpiewski, Damian, Malvone, Vadim
Security properties are often focused on the technological side of the system. One implicitly assumes that the users will behave in the right way to preserve the property at hand. In real life, this cannot be taken for granted. In particular, security mechanisms that are difficult and costly to use are often ignored by the users, and do not really defend the system against possible attacks. Here, we propose a graded notion of security based on the complexity of the user's strategic behavior. More precisely, we suggest that the level to which a security property $\varphi$ is satisfied can be defined in terms of (a) the complexity of the strategy that the voter needs to execute to make $\varphi$ true, and (b) the resources that the user must employ on the way. The simpler and cheaper it is to obtain $\varphi$, the higher the degree of security. We demonstrate how the idea works in a case study based on an electronic voting scenario. To this end, we model the vVote implementation of the Prêt à Voter voting protocol for coercion-resistant and voter-verifiable elections. Then, we identify "natural" strategies for the voter to obtain receipt-freeness, and measure the voter's effort that they require. We also look at how hard it is for the coercer to compromise the election through a randomization attack.
Automatic Procurement Fraud Detection with Machine Learning
Although procurement fraud is a critical problem in almost every free market, audit departments still rely heavily on reports from informed sources to detect it. With our generous cooperator, SF Express, sharing access to its database of procurements that took place from 2015 to 2017, our team studies how machine learning techniques could help with auditing one of the most serious crimes in the current Chinese market, namely procurement fraud. By representing each procurement event as 9 specific features, we construct neural network models to identify suspicious procurements and classify their fraud types. By testing our models on 50,000 samples collected from the procurement database, we have shown that such models, despite having room for improvement, are useful in detecting procurement fraud.
Estimator (Statistics)
A simple example of estimators and estimation in practice is the so-called "German Tank Problem" from World War Two. The Allies had no way to know for sure how many tanks the Germans were building every month. By counting the serial numbers of captured or destroyed tanks (the estimand), Allied statisticians created an estimator rule. This equation calculated the maximum possible number of tanks based upon the sequential serial numbers, and applied minimum variance analysis to generate the most likely estimate for how many new tanks Germany was building.
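One common textbook form of such an estimator rule is the minimum-variance unbiased estimator N = m + m/k - 1, where m is the largest observed serial number and k is the number of observed tanks. The sketch below assumes serial numbers run consecutively from 1 and that the observed tanks are a uniform random sample without replacement; the function name `german_tank_estimate` and the sample data are hypothetical, and the snippet does not claim to be the exact rule the Allied statisticians used.

```python
from typing import Sequence


def german_tank_estimate(serials: Sequence[int]) -> float:
    """Estimate the total number of tanks from observed serial numbers.

    Assumes serials are drawn without replacement from 1..N.
    Returns m + m/k - 1, with m the sample maximum and k the sample size.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    # The sample maximum alone underestimates N; m/k - 1 adds back the
    # average gap between observed serial numbers.
    return m + m / k - 1


# Hypothetical sample of four captured serial numbers.
estimate = german_tank_estimate([19, 40, 42, 60])
```

With four observations and a maximum of 60, the estimate is 60 + 60/4 - 1 = 74, somewhat above the largest serial number actually seen.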
Deep Serial Number: Computational Watermarking for DNN Intellectual Property Protection
Tang, Ruixiang, Du, Mengnan, Hu, Xia
In this paper, we introduce DSN (Deep Serial Number), a new watermarking approach that can prevent a stolen model from being deployed by unauthorized parties. Recently, watermarking in DNNs has emerged as a new research direction for owners to claim ownership of DNN models. However, the verification schemes of existing watermarking approaches are vulnerable to various watermark attacks. Different from existing work that embeds identification information into DNNs, we explore a new DNN Intellectual Property Protection mechanism that can prevent adversaries from deploying stolen deep neural networks. Motivated by the success of serial numbers in protecting conventional software IP, we introduce the first attempt to embed a serial number into DNNs. Specifically, the proposed DSN is implemented in the knowledge distillation framework, where a private teacher DNN is first trained, then its knowledge is distilled and transferred to a series of customized student DNNs. During the distillation process, each customer DNN is augmented with a unique serial number, i.e., an encrypted 0/1 bit trigger pattern. A customer DNN works properly only when a potential customer enters the valid serial number. The embedded serial number could be used as a strong watermark for ownership verification. Experiments on various applications indicate that DSN is effective in terms of preventing unauthorized application while not sacrificing the original DNN performance. The experimental analysis further shows that DSN is resistant to different categories of attacks.
Ring doorbells (second gen) have been catching on fire--Is yours safe?
Ring, a subsidiary of Amazon, recalled 350,000 Ring Video Doorbells (2nd Generation) recently, after 23 of them reportedly started fires, according to a statement by the U.S. Consumer Product Safety Commission. While the recall sounds alarming, if the doorbell was properly installed, there's no risk, according to Ring. Using improper screws during installation can cause the battery to overheat and catch on fire. The recall is for Ring Video Doorbell (2nd Generation), model number 5UM5E5. These units were sold between June and October 2020.